SINGULAR VALUES OF GAUSSIAN MATRICES AND 
PERMANENT ESTIMATORS 



MARK RUDELSON AND OFER ZEITOUNI 



Abstract. We present estimates on the small singular values of 
a class of matrices with independent Gaussian entries and inho- 
mogeneous variance profile, satisfying a strong-connectedness con- 
dition. Using these estimates and concentration of measure for 
the spectrum of Gaussian matrices with independent entries, we 
prove that for a large class of graphs satisfying an appropriate ex- 
pansion property, the Barvinok-Godsil-Gutman estimator for the 
permanent achieves sub-exponential errors with high probability. 

1. Introduction 
Recall that the permanent of an n-by-n matrix $A$ is defined as

\[ \mathrm{per}(A) = \sum_{\pi \in S_n} a_{1,\pi(1)} a_{2,\pi(2)} \cdots a_{n,\pi(n)}, \]

where the summation is over all permutations of n elements. In this 
paper we consider only matrices A with non-negative entries. This in- 
cludes in particular matrices with 0-1 entries, for which the evaluation 
of the permanent is fundamental in combinatorial counting problems. 
For general 0-1 matrices, the evaluation of the permanent is a
#P-complete problem [18]. Thus, the interest is in obtaining algorithms
that compute approximations to the permanent, and indeed a polynomial
running time Markov Chain Monte Carlo randomized algorithm that
evaluates $\mathrm{per}(A)$ (up to $(1+\epsilon)$ multiplicative errors,
with complexity polynomial in $\epsilon$) is available [10]. In practice,
however, the running time of such an algorithm, which is $O(n^{10})$,
still makes it challenging to implement for large $n$. (An alternative,
faster MCMC algorithm is presented in [3], with claimed running time of
$O(n^{7}(\log n)^{4})$.)

An earlier simple probabilistic algorithm for the evaluation of
$\mathrm{per}(A)$ is based on the following observation: if $X_{ij}$ are
i.i.d. zero mean variables with unit variance and $X$ is an $n \times n$
matrix with entries $X_{ij}$, then an easy computation shows that

\[ \mathrm{per}(A) = E\big(\det(A_{1/2} \odot X)\big)^2, \tag{1.1} \]

where for any two $n \times m$ matrices $A, B$, $D = A \odot B$ denotes
their Hadamard, or Schur, product, i.e. the $n \times m$ matrix with
entries $d_{ij} = a_{ij} \cdot b_{ij}$, and where
$A_{1/2}(i,j) = A(i,j)^{1/2}$. Thus, $\det(A_{1/2} \odot X)^2$ is an
unbiased estimator of $\mathrm{per}(A)$. This algorithm was proposed
(with $X_{ij} \in \{-1, 1\}$) in [7]. Since the evaluation of
determinants is computationally easy via Gaussian elimination, the main
question is the approximation error, and in particular the concentration
of the random variable $\det^2(A_{1/2} \odot X)$ around its mean. For
general matrices $A$ with
non-negative entries, Barvinok showed that using standard Gaussian 
variables Xij, with probability approaching one, the resulting multi- 
plicative error is at most exponential in n, with sharp constant. (The 
constant cannot be improved, as the example of A being the identity 
matrix shows.) 
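
The identity (1.1) can be checked numerically on a tiny example. The following is only an illustrative sketch, not code from the works cited above: the function names (`permanent`, `determinant`, `estimator_sample`, `exact_sign_average`, `gaussian_estimate`) are hypothetical, the permanent is computed by brute force over all permutations, and the exact average is taken over all sign matrices $X \in \{-1,1\}^{n \times n}$, which is feasible only for very small $n$.

```python
import itertools
import math
import random


def permanent(a):
    """Exact permanent by summing over all permutations (tiny n only)."""
    n = len(a)
    return sum(
        math.prod(a[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def estimator_sample(a, x):
    """det(A_{1/2} (Hadamard) X)^2 for one realisation X."""
    n = len(a)
    scaled = [[math.sqrt(a[i][j]) * x[i][j] for j in range(n)] for i in range(n)]
    return determinant(scaled) ** 2


def exact_sign_average(a):
    """Average of the estimator over every X with entries in {-1, 1}."""
    n = len(a)
    total, count = 0.0, 0
    for signs in itertools.product((-1.0, 1.0), repeat=n * n):
        x = [list(signs[i * n:(i + 1) * n]) for i in range(n)]
        total += estimator_sample(a, x)
        count += 1
    return total / count


def gaussian_estimate(a, samples, rng):
    """Monte Carlo average of the estimator with standard Gaussian X."""
    n = len(a)
    total = 0.0
    for _ in range(samples):
        x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        total += estimator_sample(a, x)
    return total / samples
```

For a 0-1 matrix of size 3 the sign average reproduces the permanent exactly, while `gaussian_estimate` fluctuates around it; the size of those fluctuations is the subject of the results below.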

For restricted classes of matrices, better performance is possible.
Thus, in [6], the authors analyzed a variant of the Godsil-Gutman
algorithm due to [11] and showed that for certain dense, random 0-1
matrices, a multiplicative $(1+\epsilon)$ error is achieved in time
$O(n\omega(n)\epsilon^{-2})$. In [5], it is shown that for a restricted
class of non-random matrices, the performance achieved by the
Barvinok-Godsil-Gutman estimator is better than in the worst-case
scenario. Indeed, if for some fixed constants $\alpha, \beta > 0$ one has
$a_{ij} \in [\alpha, \beta]$ then for any $\delta > 0$, with $G$ denoting
the standard Gaussian matrix,

\[ \lim_{n \to \infty} P\left( \frac{1}{n} \left| \log \frac{\det^2(A_{1/2} \odot G)}{\mathrm{per}(A)} \right| > \delta \right) = 0, \]

uniformly in $A$; that is, for such matrices this estimator achieves
subexponential (in $n$) errors, with $O(n^3)$ running time. An improved
analysis is presented in [4], who show that the approximation error in
the same set of matrices is only exponential in $n^{2/3} \log n$.

The class of matrices considered in [5] is somewhat restricted: first,
it does not include incidence matrices of non-trivial graphs, and
second, for such matrices, as noted in [5], a polynomial error
deterministic algorithm with running time $O(n^4)$ is available by
adapting the algorithm in [12]. Our goal in this paper is to better
understand the
properties of the Barvinok-Godsil-Gutman estimator, and show that in
fact the same analysis applies for a class of matrices that arise from
$(\delta, \kappa)$-strongly connected graphs, i.e. graphs with good
expansion properties (see Definition 2.1 for a precise definition). Our
first main result concerning permanent estimators reads as follows.

Theorem 1.1. There exist $C, C', c$ depending only on $\delta, \kappa$
such that for any $\tau \ge 1$ and any adjacency matrix $A$ of a
$(\delta, \kappa)$-strongly connected graph,

\[ P\left( \left| \log\det{}^2(A_{1/2} \odot G) - E \log\det{}^2(A_{1/2} \odot G) \right| > C(\tau n \log n)^{1/3} \right) \le \exp(-\tau) + \exp\left(-c\sqrt{n}/\log n\right), \tag{1.2} \]

and

\[ E \log\det{}^2(A_{1/2} \odot G) \le \log \mathrm{per}(A) \le E \log\det{}^2(A_{1/2} \odot G) + C'\sqrt{n \log n}. \]

For a more refined probability bound see Theorem 7.1. Combining the
two inequalities of Theorem 1.1 we obtain the concentration of the
Barvinok-Godsil-Gutman estimator around the permanent.



Corollary 1.2. Under the assumptions of Theorem 1.1,

\[ P\left( \left| \log \frac{\det^2(A_{1/2} \odot G)}{\mathrm{per}(A)} \right| > 2C'\sqrt{n \log n} \right) \le \exp\left(-c\sqrt{n}/\log n\right). \]

This corollary implies the uniform convergence in probability if we
consider a family of $(\delta, \kappa)$-strongly connected $n \times n$
bipartite graphs with $n \to \infty$.



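Corollary 1.3. Let $SC_{\delta,\kappa,n}$ denote the collection of
adjacency matrices of $(\delta, \kappa)$-strongly connected $n \times n$
bipartite graphs. Let $\{\tau_n\}_{n=1}^{\infty}$ be a sequence of
positive numbers such that $\tau_n \to \infty$. Set
$s_n = \tau_n \sqrt{n \log n}$. Then for any $\epsilon > 0$,

\[ \lim_{n \to \infty} \sup_{A \in SC_{\delta,\kappa,n}} P\left( \frac{1}{s_n} \left| \log \frac{\det^2(A_{1/2} \odot G)}{\mathrm{per}(A)} \right| > \epsilon \right) = 0. \tag{1.3} \]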



We remark that the error estimate (1.3) in Corollary 1.3 is probably
not optimal. Indeed, in the special case $A_{ij} = 1$, a consequence of
the distributional results concerning matrices with i.i.d. Gaussian
entries [19], see also [4], is that (1.3) holds with $s_n$ satisfying
$s_n/\log n \to \infty$. As Theorem 1.1 shows, the main source of error
is the discrepancy between $E \log\det^2(A_{1/2} \odot G)$ and
$\log E \det^2(A_{1/2} \odot G)$.

Our second main result pertains to graphs whose adjacency matrix $A$
satisfies $\mathrm{per}(A) > 0$. For such matrices, there exists a
(polynomial time) scaling algorithm that transforms $A$ into an (almost)
doubly stochastic matrix, see [12, Pages 552-553]. In particular, there
exists a deterministic algorithm (with running time $O(n^4)$) that
outputs non-negative diagonal matrices $D_1, D_2$ so that
$B = D_1 A D_2$ is an approximately doubly stochastic matrix, i.e.
$\sum_i B_{ij} \in [1/2, 2]$ and $\sum_j B_{ij} \in [1/2, 2]$. (Much more
can be achieved, but we do not use that fact.) Since
$\mathrm{per}(A) = \mathrm{per}(B) \cdot \prod_i \big(D_1(i,i) D_2(i,i)\big)^{-1}$,
evaluating $\mathrm{per}(A)$ thus reduces to the evaluation of
$\mathrm{per}(B)$. The properties of the Barvinok-Godsil-Gutman estimator
for such matrices are given in the following theorem.
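
To make the scaling step concrete, here is a minimal sketch. It is not the algorithm of [12]: it uses the classical Sinkhorn alternating normalisation of rows and columns, and it assumes that all entries of the input matrix are strictly positive. The names `sinkhorn_scale` and `permanent` are hypothetical helpers introduced only for this illustration.

```python
import itertools
import math


def permanent(a):
    """Exact permanent by summing over all permutations (tiny n only)."""
    n = len(a)
    return sum(
        math.prod(a[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )


def sinkhorn_scale(a, sweeps=50):
    """Alternately normalise rows and columns of a positive square matrix.

    Returns (d1, d2, b) where b[i][j] = d1[i] * a[i][j] * d2[j], so that
    b = D1 A D2 with D1 = diag(d1) and D2 = diag(d2).
    """
    n = len(a)
    d1 = [1.0] * n
    d2 = [1.0] * n
    for _ in range(sweeps):
        for i in range(n):  # make every row of D1 A D2 sum to 1
            d1[i] = 1.0 / sum(a[i][j] * d2[j] for j in range(n))
        for j in range(n):  # make every column of D1 A D2 sum to 1
            d2[j] = 1.0 / sum(d1[i] * a[i][j] for i in range(n))
    b = [[d1[i] * a[i][j] * d2[j] for j in range(n)] for i in range(n)]
    return d1, d2, b
```

Because the permanent is multilinear in the rows and in the columns, $\mathrm{per}(B) = \mathrm{per}(A) \prod_i D_1(i,i) D_2(i,i)$, so the permanent of $A$ is recovered from that of $B$ by dividing by the product of the scaling factors.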

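Theorem 1.4. Let $r > 0$. There exist $c, C, C'$ depending only on
$r, \delta, \kappa$ with the following property. Let $0 \le b_n \le n$ be
a given sequence. Let $B$ be an $n \times n$ matrix with entries
$0 \le b_{i,j} \le b_n/n$ such that

\[ \sum_{i=1}^{n} b_{i,j} \le 1 \quad \text{for all } j \in [n], \qquad \sum_{j=1}^{n} b_{i,j} \le 1 \quad \text{for all } i \in [n]. \]

Define the bipartite graph $\Gamma = \Gamma_B$ connecting the vertices
$i$ and $j$ whenever $b_{i,j} \ge r/n$, and assume that $\Gamma$ is
$(\delta, \kappa)$-strongly connected. Then for any $\tau \ge 1$

\[ P\left( \left| \log\det{}^2(B_{1/2} \odot G) - E \log\det{}^2(B_{1/2} \odot G) \right| > C(\tau b_n n)^{1/3} \log^{c'} n \right) \le \exp(-\tau) + \exp\left(-c\sqrt{n}/\log^{c} n\right) \tag{1.4} \]

and

\[ E \log\det{}^2(B_{1/2} \odot G) \le \log \mathrm{per}(B) \le E \log\det{}^2(B_{1/2} \odot G) + C'\sqrt{b_n n}\, \log^{c'} n. \tag{1.5} \]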

As in Theorem 1.1, we can derive the concentration around the
permanent and the uniform convergence in probability.

Corollary 1.5. Under the assumptions of Theorem 1.4,

\[ P\left( \left| \log \frac{\det^2(B_{1/2} \odot G)}{\mathrm{per}(B)} \right| > 2C'\sqrt{b_n n}\, \log^{c'} n \right) \le \exp\left(-c\sqrt{n}/\log^{c} n\right). \]

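Corollary 1.6. Let $GSC_{c,\delta,\kappa,n}$ denote the collection of
$n \times n$ matrices $B$ with properties as in Theorem 1.4. Then there
exists a constant $C = C(c, \delta, \kappa)$ so that with
$s_n = (n b_n \log^{C} n)^{1/2}$, and any $\epsilon > 0$,

\[ \lim_{n \to \infty} \sup_{B \in GSC_{c,\delta,\kappa,n}} P\left( \frac{1}{s_n} \left| \log \frac{\det^2(B_{1/2} \odot G)}{\mathrm{per}(B)} \right| > \epsilon \right) = 0. \]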



Corollary 1.5 applies, in particular, to approximately doubly
stochastic matrices $B$ whose entries satisfy $c/n \le b_{ij} \le 1$ for
all $i, j$. For such matrices the graph $\Gamma_B$ is complete, so the
strong connectedness condition is trivially satisfied. Note that if such
a matrix contains entries of order $\Omega(1)$, then the algorithm of
[12] estimates the permanent with an error exponential in $n$. In this
case, $b_n = \Omega(n)$, and Corollary 1.5 is weaker than Barvinok's
theorem in [2]. This is due to the fact that we do not have a good bound
for the gap between $E \log\det^2(B_{1/2} \odot G)$ and
$\log \mathrm{per}(B)$, see (1.5). However, this bound cannot be
significantly improved in general, even for well-connected matrices. As
we show in Lemma 7.3, the gap between these values is of order
$\Omega(n)$ for a matrix with all diagonal entries equal to 1 and all
off-diagonal entries equal to $c/n$. For such a matrix, the
Barvinok-Godsil-Gutman estimator will fail consistently, i.e., it will be
concentrated around a value which is $\exp(cn)$ far away from the
permanent. Thus, we conclude that for almost doubly stochastic matrices
with a strongly connected graph the Barvinok-Godsil-Gutman estimator
either approximates the permanent up to $\exp(o(n))$ with high
probability, or yields an exponentially big error with high probability.

As in [5], Theorems 1.3 and 1.6 depend on concentration of linear
statistics of the spectrum of random (inhomogeneous) Gaussian matrices;
this in turn requires a good control on small singular values of such
matrices. Thus, the first part of the current paper deals with the
latter question, and proceeds as follows. In Section 2 we define the
notion of strongly connected bipartite graphs, and state our main
results concerning small singular values of Gaussian matrices, Theorems
2.3 and 2.4; we also state applications of the latter theorems to both
adjacency graphs and to "almost" doubly stochastic matrices, see
Theorems 2.5 and 2.7. Section 3 is devoted to several preliminary lemmas
involving $\epsilon$-net arguments. In Section 4 we recall the notion of
compressible vectors and obtain estimates on the norm of Gaussian
matrices restricted to compressible vectors. The control of the minimal
singular value (that necessitates the study of incompressible vectors)
is obtained in Section 5, while Section 6 is devoted to the study of
intermediate singular values. In Section 7 we return to the analysis of
the Barvinok-Godsil-Gutman estimator, and use the control on singular
values together with an improved (compared to [5]) use of concentration
inequalities to prove the applications and the main theorems in the
introduction.

Acknowledgment We thank A. Barvinok and A. Samorodnitsky for
sharing with us their knowledge of permanent approximation algorithms,
and for useful suggestions.

2. Definitions and results

For a matrix $A$ we denote its operator norm by $\|A\|$, and set
$\|A\|_\infty = \max |a_{ij}|$. By $[n]$ we denote the set
$\{1, \ldots, n\}$. By $\lfloor t \rfloor$ we denote the integer part of
$t$.

Let $J \subset [m]$. Denote by $\mathbb{R}^J$ and $S^J$ the coordinate
subspace of $\mathbb{R}^m$ corresponding to $J$ and its unit sphere.

For a left vertex $j \in [m]$ and a right vertex $i \in [n]$ of a
bipartite graph $\Gamma = ([m], [n], E)$ we write $j \to i$ if $j$ is
connected to $i$.
Definition 2.1. Let $\delta, \kappa > 0$, $\delta/2 > \kappa$. Let
$\Gamma$ be an $m \times n$ bipartite graph. We will say that $\Gamma$
is $(\delta, \kappa)$-strongly connected if

(1) $\deg(i) \ge \delta m$ for all $i \in [n]$;

(2) $\deg(j) \ge \delta n$ for all $j \in [m]$;

(3) for any set $J \subset [m]$ the set of its strongly connected
neighbors
\[ I(J) = \{ i \in [n] \mid j \to i \text{ for at least } \lfloor (\delta/2) \cdot |J| \rfloor \text{ numbers } j \in J \} \]
has cardinality $|I(J)| \ge \min((1+\kappa)|J|, n)$.

We fix the numbers $\delta, \kappa$ and call such a graph strongly
connected. Property (3) in this definition is similar to the expansion
property of the graph. In the argument below we denote by $C$, $c$, etc.
constants depending on the parameters $\delta$, $\kappa$ and $r$
appearing in Theorems 2.3 and 2.4. The values of these constants may
change from line to line.

Although condition (3) is formulated for all sets $J \subset [m]$, it is
enough to check it only for sets with cardinality
$|J| \le (1 - \delta/2)m$. Indeed, if $|J| > (1 - \delta/2)m$, then any
$i \in [n]$ is strongly connected to $J$.
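
For small graphs, Definition 2.1 can be verified by exhaustive search. The sketch below is illustrative only and is exponential in $m$; the names `strong_neighbors` and `is_strongly_connected` are hypothetical, and the graph is assumed to be given as a 0-1 array `adj` with `adj[j][i]` nonzero exactly when the left vertex $j$ is connected to the right vertex $i$.

```python
import itertools
import math


def strong_neighbors(adj, subset, delta):
    """The set I(J): right vertices joined to at least
    floor((delta / 2) * |J|) left vertices of J = subset."""
    n = len(adj[0])
    threshold = math.floor((delta / 2) * len(subset))
    return {
        i for i in range(n)
        if sum(1 for j in subset if adj[j][i]) >= threshold
    }


def is_strongly_connected(adj, delta, kappa):
    """Brute-force check of conditions (1)-(3) of Definition 2.1."""
    m, n = len(adj), len(adj[0])
    # (1) every right vertex has degree at least delta * m
    if any(sum(1 for j in range(m) if adj[j][i]) < delta * m for i in range(n)):
        return False
    # (2) every left vertex has degree at least delta * n
    if any(sum(1 for i in range(n) if adj[j][i]) < delta * n for j in range(m)):
        return False
    # (3) expansion of the strongly connected neighborhood, over all J
    for size in range(1, m + 1):
        for subset in itertools.combinations(range(m), size):
            needed = min((1 + kappa) * size, n)
            if len(strong_neighbors(adj, subset, delta)) < needed:
                return False
    return True
```

For instance, a complete bipartite graph passes the check, while a disjoint union of two complete blocks of equal size fails condition (3): the strongly connected neighborhood of one block is only that block itself.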

Definition 2.2. Let $A$ be an $m \times n$ matrix. Define the graph
$\Gamma_A = ([m], [n], E)$ by setting $j \to i$ whenever
$a_{j,i} \ne 0$.

We will prove two theorems bounding the singular values of a matrix
with normal entries. In the theorems, we allow for non-centered entries
because it will be useful for the application of the theorem in the
proof of Theorem 2.7.

Theorem 2.3. Let $W$ be an $n \times n$ matrix with independent normal
entries $w_{i,j} \sim N(b_{i,j}, a_{i,j}^2)$. Assume that

(1) $a_{i,j} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_A$ is strongly connected;

(3) $\|EW\| \le K\sqrt{n}$ for some $K \ge 1$.

Then for any $t > 0$

\[ P\left( s_n(W) \le c t K^{-C} n^{-1/2} \right) \le t + e^{-c'n}. \]

Theorem 2.4. Let $n/2 < m \le n - 4$, and let $W$ be an $n \times m$
matrix with independent normal entries
$w_{i,j} \sim N(b_{i,j}, a_{i,j}^2)$. Assume that

(1) $a_{i,j} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_A$ is strongly connected;

(3) $\|EW\| \le K\sqrt{n}$.

Then for any $t > 0$

\[ P\left( s_m(W) \le c t K^{-C} \cdot \frac{n-m}{\sqrt{n}} \right) \le t^{(n-m)/4} + e^{-c'n}. \]

In Theorems 2.3 and 2.4 we assume that the graph $\Gamma_A$ is strongly
connected. This condition can be relaxed. In fact, property (3) in the
definition of strong connectedness is used only for sets $J$ of
cardinality $|J| \ge (r^2\delta/6)m$ (see Lemmas 4.1 and 4.2 for
details).

We apply Theorems 2.3 and 2.4 to two types of matrices. Consider first
the situation when the matrix $A$ is an adjacency matrix of a graph, and
$EW = 0$.

Theorem 2.5. Let $\Gamma$ be a strongly connected $n \times n$
bipartite graph, and let $A$ be its adjacency matrix. Let $G$ be the
$n \times n$ standard Gaussian matrix. Then for any $t > 0$

\[ P\left( s_n(A \odot G) \le c t n^{-1/2} \right) \le t + e^{-c'n}, \]

and for any $n/2 < m \le n - 4$

\[ P\left( s_m(A \odot G) \le c t \cdot \frac{n-m}{\sqrt{n}} \right) \le t^{(n-m)/4} + e^{-c'n}. \]

Theorem 2.5 is also applicable to the case when $\Gamma$ is an
unoriented graph with $n$ vertices. In this case we denote by $A$ its
adjacency matrix, and assume that the graph $\Gamma_A$ is strongly
connected.

Remark 2.6. With some additional effort the bound $m \le n - 4$ in
Theorem 2.5 can be eliminated, and the term $t^{(n-m)/4}$ in the right
hand side can be replaced with $t^{n-m+1}$.

The second application pertains to "almost" doubly stochastic matrices,
i.e. matrices with uniformly bounded norms of rows and columns.

Theorem 2.7. Let $W$ be an $n \times n$ matrix with independent normal
entries $w_{i,j} \sim N(0, \sigma_{i,j}^2)$. Assume that the matrix of
variances $(\sigma_{i,j}^2)_{i,j=1}^{n}$ satisfies the conditions

(1) $\sum_{i=1}^{n} \sigma_{i,j}^2 \le C$ for any $j \in [n]$, and

(2) $\sum_{j=1}^{n} \sigma_{i,j}^2 \le C$ for any $i \in [n]$.

Consider an $n \times n$ bipartite graph $\Gamma$ defined as follows:

\[ i \to j, \quad \text{whenever } \frac{c}{n} \le \sigma_{i,j}^2, \]

and assume that $\Gamma$ is strongly connected. Then for any $t > 0$

\[ P\left( s_n(W) \le c t n^{-1} \log^{-C'} n \right) \le t + \exp(-C \log^4 n), \]

and for any $n/2 < m \le n - 4$

\[ P\left( s_m(W) \le c t \cdot \frac{n-m}{n \log^{C'} n} \right) \le t^{(n-m)/4} + \exp(-C \log^4 n). \]

Note that the condition on the variance matrix in Theorem 2.7 does not
exclude the situation where several of its entries $\sigma_{i,j}^2$ are
of the order $\Omega(1)$. Also, $\exp(-C \log^4 n)$ in the probability
estimate can be replaced by $\exp(-C \log^{p} n)$ for any $p$. Of course,
the constants $C, C', c$ would then depend on $p$.

3. Matrix norms and the $\epsilon$-net argument

We prepare in this section some preliminary estimates that will be
useful in bounding probabilities by $\epsilon$-net arguments. First, we
have the following bound on the norm of a random matrix as an operator
acting between subspaces of $\mathbb{R}^n$. This will be useful in the
proof of Theorem

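Lemma 3.1. Let $A$ be an $n \times n$ matrix with
$\|A\|_\infty \le 1$, and let $G$ be an $n \times n$ standard Gaussian
matrix. Then for any subspaces $E, F \subset \mathbb{R}^n$ and any
$s \ge 1$,

\[ P\left( \|P_F(A \odot G) : E \to \mathbb{R}^n\| \ge c s (\sqrt{\dim E} + \sqrt{\dim F}) \right) \le \exp\left(-C s^2 (\dim E + \dim F)\right), \]

where $P_F$ is the orthogonal projection onto $F$.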

Proof. When ajj = 1, the lemma is a direct consequence of the rota- 
tional invariance of the Gaussian measure, and standard concentration 
estimates for the top singular value of a Wishart matrix [161 Propo- 
sition 2.3]. For general A satisfying the assumptions of the lemma, 
the claim follows from the contraction argument in e.g. pT[ Lemma 
2.7], since the collection of entries $\{g_{ij}\}$ so that
$\|A \odot G : E \to F\| \le c s (\sqrt{\dim E} + \sqrt{\dim F})$ is a
convex symmetric set. We give an alternative direct proof: let
$A'_{ij} = \sqrt{1 - A_{ij}^2}$, and note that $G$ equals in distribution
$A \odot G_1 + A' \odot G_2$ where $G_1, G_2$ are independent copies of
$G$. On the event

\[ \mathcal{A}_1 := \left\{ \|P_F(A \odot G_1) : E \to \mathbb{R}^n\| \ge c s (\sqrt{\dim E} + \sqrt{\dim F}) \right\}, \]

there exist unit vectors $v_{G_1} \in F$, $w_{G_1} \in E$ so that
$|v_{G_1}^T (A \odot G_1) w_{G_1}| \ge c s (\sqrt{\dim E} + \sqrt{\dim F})$.
On the other hand, for any fixed $v, w$, $v^T (A' \odot G_2) w$ is a
Gaussian variable of variance bounded by 1, and hence the event

\[ \mathcal{A}_2(v, w) := \left\{ |v^T (A' \odot G_2) w| \ge c s (\sqrt{\dim E} + \sqrt{\dim F})/2 \right\} \]

has probability bounded above by

\[ \exp\left(-C s^2 (\sqrt{\dim E} + \sqrt{\dim F})^2\right) \le \exp\left(-C s^2 (\dim E + \dim F)\right). \]

The proof is completed by noting that

\[ P(\mathcal{A}_1) \le E\left( P(\mathcal{A}_2(v_{G_1}, w_{G_1}) \mid \mathcal{A}_1) \right) + P\left( \|P_F G : E \to \mathbb{R}^n\| \ge c s (\sqrt{\dim E} + \sqrt{\dim F})/2 \right). \]

□

To prove Theorem 2.7 we will need an estimate of the norm of the
matrix, which is based on a result of Riemer and Schütt [14].

Lemma 3.2. Let $A$ be an $n \times n$ matrix satisfying conditions (1)
and (2) in Theorem 2.7. Then

\[ P\left( \|A \odot G\| \ge C \log^2 n \right) \le \exp(-C \log^4 n). \]

Proof. Write $X = A \odot G$. By [14, Theorem 1.2],

\[ E\|A \odot G\| \le C (\log^{3/2} n)\, E \max_{i=1,\ldots,n} \left( \|(X_{i,j})_{j=1}^{n}\|_2 + \|(X_{j,i})_{j=1}^{n}\|_2 \right). \tag{3.1} \]

Set $\eta_i = \|(X_{i,j})_{j=1}^{n}\|_2$, $i = 1, \ldots, n$, and
$\Delta_i = \sum_{j=1}^{n} a_{i,j}^2 \le C$. Define
$\beta_{i,j} = a_{i,j}^2/\Delta_i \le 1$. For $\theta \le 1/4C$ one has
that

\[ \log E e^{\theta \eta_i^2} = -\frac{1}{2} \sum_{j=1}^{n} \log(1 - 2\beta_{i,j}\Delta_i\theta) \le c\theta, \]

for some constant $c$ depending only on $C$. In particular, the
independent random variables $\eta_i$ possess uniform (in
$i, \delta, n$) subgaussian tails, and therefore,
$E \max_{i=1,\ldots,n} \eta_i \le c' (\log n)^{1/2}$. Arguing similarly
for $E \max_{j=1,\ldots,n} \|(X_{i,j})_{i=1}^{n}\|_2$ and substituting in
(3.1), one concludes that

\[ E\|A \odot G\| \le C \log^2 n. \]

The lemma follows from the concentration for the Gaussian measure,
since $F : \mathbb{R}^{n^2} \to \mathbb{R}$, $F(B) = \|A \odot B\|$ is a
1-Lipschitz function, see e.g.

m- □ 

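Throughout the proofs below we will repeatedly use the easiest form of
the $\epsilon$-net argument. For convenience, we will formulate it as a
separate lemma.

Lemma 3.3. Let $V$ be an $n \times m$ random matrix. Let
$\mathcal{L} \subset S^{m-1}$ be a set contained in an $l$-dimensional
subspace of $\mathbb{R}^m$. Assume that there exists $\epsilon > 0$ such
that for any $x \in \mathcal{L}$

\[ P\left( \|Vx\|_2 < \epsilon\sqrt{n} \right) \le p. \]

Denote by $\mathcal{L}_\alpha$ the $\alpha$-neighborhood of
$\mathcal{L}$ in $\mathbb{R}^m$. Then

\[ P\left( \exists x \in \mathcal{L}_{\epsilon/(4K)} : \|Vx\|_2 < (\epsilon/2)\sqrt{n} \text{ and } \|V\| \le K\sqrt{n} \right) \le \left( \frac{6K}{\epsilon} \right)^{l} \cdot p. \]

Proof. Let $\mathcal{N} \subset \mathcal{L}$ be an
$(\epsilon/(4K))$-net in $\mathcal{L}$. By the volumetric estimate, we
can choose $\mathcal{N}$ of cardinality
$|\mathcal{N}| \le (6K/\epsilon)^{l}$. Assume that there exists
$y \in \mathcal{L}_{\epsilon/(4K)}$ such that
$\|Vy\|_2 < (\epsilon/2) \cdot \sqrt{n}$. Choose $x \in \mathcal{N}$ for
which $\|y - x\|_2 < \epsilon/(2K)$. If $\|V\| \le K\sqrt{n}$, then

\[ \|Vx\|_2 \le (\epsilon/2)\sqrt{n} + \|V\| \cdot \frac{\epsilon}{2K} \le \epsilon\sqrt{n}. \]

Therefore, by the union bound,

\[ P\left( \exists y \in \mathcal{L}_{\epsilon/(4K)} : \|Vy\|_2 \le (\epsilon/2)\sqrt{n} \text{ and } \|V\| \le K\sqrt{n} \right) \le \left( \frac{6K}{\epsilon} \right)^{l} \cdot p. \]

□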



4. Compressible vectors 

As developed in detail in [15, 16], when estimating singular values it
is necessary to handle separately the action of random matrices on
compressible, e.g., close to sparse, vectors. We begin with a basic
small ball estimate.

Lemma 4.1. Let $m, n \in \mathbb{N}$. Let $A, B$ be (possibly random)
$n \times m$ matrices, and let $W = A \odot G + B$, where $G$ is the
$n \times m$ Gaussian matrix, independent of $A, B$. Assume that, a.s.,

(1) $a_{ij} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_A$ satisfies $\deg(j) \ge \delta n$ for all
$j \in [m]$.

Then for any $x \in S^{m-1}$, $z \in \mathbb{R}^n$ and for any $t > 0$

\[ P\left( \|Wx - z\|_2 \le t\sqrt{n} \right) \le (Ct)^{cn}. \]

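Proof. Let $x \in S^{m-1}$. Set
$I = \{ i \in [n] \mid \sum_{j=1}^{m} a_{i,j}^2 x_j^2 \ge r^2\delta/2 \}$.
Let $\Gamma = \Gamma_{A^T}$ be the graph of the matrix $A^T$. The
inequality

\[ \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}^2 x_j^2 \ge r^2 \sum_{j=1}^{m} \deg(j)\, x_j^2 \ge r^2 \delta n \sum_{j=1}^{m} x_j^2 = r^2 \delta n \]

implies

\[ \sum_{i \in I} \left( \sum_{j=1}^{m} a_{i,j}^2 x_j^2 \right) \ge \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}^2 x_j^2 - r^2\delta n/2 \ge r^2\delta n/2. \]

On the other hand, we have the reverse inequality

\[ \sum_{i \in I} \left( \sum_{j=1}^{m} a_{i,j}^2 x_j^2 \right) \le \sum_{i \in I} \left( \sum_{j=1}^{m} x_j^2 \right) = |I|, \]

and so $|I| \ge r^2\delta n/2$.

For any $i \in I$ the independent normal random variables
$w_i = \sum_{j=1}^{m} (a_{i,j} g_{i,j} + b_{i,j}) x_j$ have variances at
least $r^2\delta/2$. Estimating the Gaussian measure of a ball by its
Lebesgue measure, we get that for any $\tau > 0$

\[ P\left( \|Wx - z\|_2^2 \le \tau^2 (r^2\delta/2)^2 \cdot n \right) \le P\left( \sum_{i \in I} (w_i - z_i)^2 \le \tau^2 (r^2\delta/2) \cdot |I| \right) \le (C\tau)^{|I|}. \]

Setting $t = \tau r^2\delta/2$ finishes the proof. □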

We now introduce the notion of compressible and incompressible
vectors. The compressible vectors will be easier to handle by an
$\epsilon$-net argument, keeping track of the degree of compressibility.
This is the content of the next three lemmas in this section.
For $u, v < 1$ denote

\[ \mathrm{Sparse}(u) = \{ x \in S^{m-1} \mid |\mathrm{supp}(x)| \le um \}, \]

and

\[ \mathrm{Comp}(u, v) = \{ x \in S^{m-1} \mid \exists y \in \mathrm{Sparse}(u),\ \|x - y\|_2 \le v \}, \]
\[ \mathrm{Incomp}(u, v) = S^{m-1} \setminus \mathrm{Comp}(u, v). \]

We employ the following strategy. In Lemma 4.2 we show that the matrix
$W$ is well invertible on the set of highly compressible vectors. Lemma
4.3 asserts that if the matrix is well invertible on the set of vectors
with a certain degree of compressibility, then we can relax the
compressibility assumption and show invertibility on a larger set of
compressible vectors. Finally, in Lemma 4.4 we prove that the matrix $W$
is well invertible on the set of all compressible vectors. This is done
by using Lemma 4.2 for highly compressible vectors, and extending the
set of vectors using Lemma 4.3 in finitely many steps. The number of
these steps will be independent of the dimension.
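
Membership in Comp(u, v) can be tested directly for a given unit vector. The sketch below is an illustration under stated assumptions, with hypothetical function names: it assumes that the input `x` has Euclidean norm 1, and it uses the elementary fact that the closest unit vector with support of size at most $k$ is the normalised restriction of $x$ to its $k$ largest coordinates in absolute value, so that the squared distance equals $2 - 2\|x_S\|_2$ for that coordinate set $S$.

```python
import math


def distance_to_sparse(x, u):
    """Euclidean distance from the unit vector x to Sparse(u)."""
    m = len(x)
    k = math.floor(u * m)  # maximal allowed support size
    if k < 1:
        raise ValueError("Sparse(u) is empty when u * m < 1")
    # Norm of x restricted to its k largest coordinates in absolute value.
    top = sorted((abs(t) for t in x), reverse=True)[:k]
    head_norm = math.sqrt(sum(t * t for t in top))
    # For unit vectors x, y: |x - y|^2 = 2 - 2 <x, y>, and the best sparse
    # unit y achieves <x, y> = head_norm.
    return math.sqrt(max(0.0, 2.0 - 2.0 * head_norm))


def is_compressible(x, u, v):
    """True when x belongs to Comp(u, v)."""
    return distance_to_sparse(x, u) <= v
```

A coordinate vector is at distance 0 from the sparse vectors, whereas the flat vector with all coordinates equal to $m^{-1/2}$ stays far from Sparse(u) unless $u$ is close to 1.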

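Lemma 4.2. Let $m, n \in \mathbb{N}$, $m \le (3/2)n$. Let $A, B, W$ be
$n \times m$ matrices satisfying the conditions of Lemma 4.1. Let
$K \ge 1$. Then there exist constants $c_0, c_1, c_2$ such that, for any
$z \in \mathbb{R}^n$,

\[ P\left( \exists x \in \mathrm{Comp}(c_0, c_1/K^2) : \|Wx - z\|_2 \le (c_1/K)\sqrt{n} \text{ and } \|W\| \le K\sqrt{n} \right) \le e^{-c_2 n}. \]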

Proof. Let $c$ be the constant from Lemma 4.1. Without loss of
generality, we may and will assume that $c < 1$. Let $t > 0$ be a number
to be chosen later. For any set $J \subset [m]$ of cardinality
$|J| = l = \lfloor cm/3 \rfloor$, Lemmas 4.1 and 3.3 imply

\[ P\left( \exists x \in (S^J)_{t/(4K)} : \|Wx\|_2 \le (t/2)\sqrt{n} \text{ and } \|W\| \le K\sqrt{n} \right) \le \left( \frac{6K}{t} \right)^{l} \cdot (Ct)^{cn}. \]

(Recall that $S^J$ is the unit sphere of the coordinate subspace of
$\mathbb{R}^m$ corresponding to $J$.) Since
$\mathrm{Comp}(c/3, t/(4K)) \subset \bigcup_{|J|=l} (S^J)_{t/(4K)}$, the
union bound yields

$P\left( \exists x \in \mathrm{Comp}(c/3, t/(4K)) : \|Wx\|_2 \le (t/2)\sqrt{n} \text{ and } \|W\| \le K\sqrt{n} \right)$

2(7)-(t-)'-<">°'^(^)"""-<">" 

which does not exceed $e^{-cn/2}$ provided that $t = c''/K$ for an
appropriately chosen $c'' > 0$. This proves the lemma if we set
$c_0 = c/3$, $c_1 = c''/4$. □

Lemma 4.3. Let $m, n \in \mathbb{N}$, $n \le 2m$. Let $A, B$ be
(possibly random) $n \times m$ matrices, and set $W = A \odot G + B$,
where $G$ is the standard $n \times m$ Gaussian matrix, independent of
$A, B$. Assume that, a.s.,

(1) $a_{ij} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_{A^T}$ is strongly connected.

Then for any $c_0$ and any $u, v > 0$, such that $u \ge c_0$ and
$(1 + \kappa/2)u < 1$, and for any $z \in \mathbb{R}^n$

P (3x G Comp((l + k/2)u, {v/Kf+^) \ Gom^{u,v) : 
\\Wx - z\\r, < cvivjKf^ and \W\ < K^/n) 

where c = c(co, k, S, r) . 

Proof. Let
$S(u, v) = \mathrm{Sparse}((1 + \kappa/2)u) \setminus \mathrm{Comp}(u, v)$.
Fix any $x \in S(u, v)$ and denote by $J$ the set of all coordinates
$j \in [m]$ such that $|x_j| \ge v/\sqrt{m}$. For any $x \in S(u, v)$,
$|J| \ge um$, since otherwise $x \in \mathrm{Comp}(u, v)$. Since the
graph $\Gamma_{A^T}$ is strongly connected, this implies that
$|I(J)| \ge (1 + \kappa)um$.

If i G /(J), then Wi = J2T= 

centered normal random 

variable with variance 

m 2 2 

Hence, for any t > 0, 

P (^llVTx - ;z||2 < tvTU ■ y/&nj < P (^\\Wx - zW^ < tvru ■ ^/5m/2^ 

< P I ^ (U7, - Zi)^ < tWMS/2) ■ |/(J)| I < (ct)l^(^)l < (ct)(i+")"™, 
\*e/(j) / 

where the third inequality is obtained by the same reasoning as at the 
end of the proof of Lemma I4.1[ Let A C [m] be any set of cardinality 
/=[(! + K/2)um\ , and denote $^ = S"^ fl S{u, v). Set e = tvru ■ VS. 
By Lemma [3.3[ 



(UVl I \/ nil 
3x G ($^)e/(4i^) : \\Wx - z\\2 < t ^ and < Ky/^ 

We have 

Comp ( (l + ^) u, ^) \ Comp(M, v) C [j ($^)./(4X). 

\A\=l 
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Therefore, the union bound yields 



^ // tvru ■ 

P I da; G Comp I (^1 + 2 j ^' — 4^ — ) \ ^o^P[u, v) 



< 




^ J \ ^ / ■ tvr ■ ^/6 



< 



V 



(l+K/2)um 



KMm/2 



This does not exceed e ''""^/^ if -^e choose 



V 

Substituting this t into the estimate above proves the lemma. □ 

Lemma 4.4. Let $m, n \in \mathbb{N}$, $(2/3)m \le n \le 2m$. Let $A, B$
be $n \times m$ matrices, and set $W = A \odot G + B$, where $G$ is the
standard $n \times m$ Gaussian matrix, independent of $A, B$. Assume
that

(1) $a_{ij} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_{A^T}$ is strongly connected.

Then for all $z \in \mathbb{R}^n$

\[ P\left( \exists x \in \mathrm{Comp}(1 - \kappa/2, K^{-C}) : \|Wx - z\|_2 \le K^{-C}\sqrt{n} \text{ and } \|W\| \le K\sqrt{n} \right) \le e^{-cn}. \]



Proof. Set $u_0 = c_0$, $v_0 = c_1 K^{-2}$, where $c_0, c_1$ are the
constants from Lemma 4.2. Let $L$ be the smallest natural number such
that

\[ u_0 (1 + \kappa/2)^{L} > 1 - \kappa/2. \]

Note that
$u_0 (1 + \kappa/2)^{L} \le (1 - \kappa/2) \cdot (1 + \kappa/2) < 1$.
Define by induction $v_{l+1} = (v_l/K)^{C+1}$, where $C$ is the constant
from Lemma 4.3. Then $v_L = K^{-C'}$ for some $C' > 0$ depending only on
the parameters $\delta$, $\kappa$ and $r$. We have

\[ \mathrm{Comp}(1 - \kappa/2, v_L) \subset \mathrm{Comp}(u_0, v_0) \cup \bigcup_{l=1}^{L} \mathrm{Comp}(u_0 (1 + \kappa/2)^{l}, v_l) \setminus \mathrm{Comp}(u_0 (1 + \kappa/2)^{l-1}, v_{l-1}). \]

The result now follows from Lemmas 4.2 and 4.3. □

5. Smallest singular value 

To estimate the smallest singular value, we need the following result
from [15, Lemma 3.5], that handles incompressible vectors.

Lemma 5.1. Let $W$ be an $n \times n$ random matrix. Let
$W_1, \ldots, W_n$ denote the column vectors of $W$, and let $H_k$
denote the span of all column vectors except the $k$-th. Then for every
$a, b \in (0, 1)$ and every $t > 0$, one has

\[ P\left( \inf_{x \in \mathrm{Incomp}(a,b)} \|Wx\|_2 < t b n^{-1/2} \right) \le \frac{1}{an} \sum_{k=1}^{n} P\left( \mathrm{dist}(W_k, H_k) < t \right). \tag{5.1} \]

Now we can derive the first main result. 

Proof of Theorem 2.3. Set $B = EW$ and let $A = (a_{ij})$, where
$a_{ij}^2 = \mathrm{Var}(w_{ij})$, so

\[ W = A \odot G + B, \]

where $G$ is the $n \times n$ standard Gaussian matrix.

Without loss of generality, assume that $K \ge K_0$, where $K_0 > 1$ is
a constant to be determined. Applying Lemma 4.2 to the matrix $W$, we
obtain

P (3x e Comp (co, CiK~^) : 

||W^a;||2 < {4cl/K)^/^ and \\W\\ < Ky/^ < e""^". 

Therefore, for any t > 

P {sniW) < ctK-^n-^''^) < e"'" + P (||W^|| > Ky/^) 

+ P(3x G Incomp (ccCif^^^) : ||W^x||2 < {4:Ci/K)y/n). 

By Lemma 3.1,

\[ P\left( \|W\| \ge 2K\sqrt{n} \right) \le P\left( \|A \odot G\| \ge K\sqrt{n} \right) \le e^{-cn}, \]

provided that $K \ge K_0$ with $K_0$ taken large enough, thus
determining $K_0$. By Lemma 5.1 it is enough to bound
$P(\mathrm{dist}(W_k, H_k) < at)$ for all $k \in [n]$. Consider, for
example, $k = 1$. In the discussion that follows, let $h \in S^{n-1}$ be
a vector such that $h^T W_j = 0$ for all $j = 2, \ldots, n$. Then

\[ \mathrm{dist}(W_1, H_1) \ge |h^T W_1|. \]

Let $\tilde{A}$ be the $(n-1) \times n$ matrix whose rows are the
columns of $A$, except the first one, i.e.
$\tilde{A}^T = (A_2, A_3, \ldots, A_n)$. Define the $(n-1) \times n$
matrices $\tilde{B}$, $\tilde{W}$ in the same way. The condition on $h$
can now be rephrased as $\tilde{W}h = 0$.

Since the graph $\Gamma_A$ is strongly connected, the graph
$\Gamma_{\tilde{A}^T}$ is strongly connected with slightly smaller
parameters and in particular with parameters $\delta/2$ and $\kappa/2$.
Since
$\mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) \subset \mathrm{Comp}(1 - \kappa/4, (2K)^{-C})$,
we get from Lemma 4.4 applied to $\tilde{W}$, $z = 0$, and with $K$
replaced by $2K$, that



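\[ P\left( \exists h \in \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) : \tilde{W}h = 0 \right) \]
\[ \le P\left( \exists h \in \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) : \|\tilde{W}h\|_2 \le (2K)^{-C}\sqrt{n} \text{ and } \|\tilde{W}\| \le 2K\sqrt{n} \right) + P\left( \|\tilde{W}\| > 2K\sqrt{n} \right) \]
\[ \le e^{-cn} + P\left( \|\tilde{W}\| > 2K\sqrt{n} \right). \]

The last term is exponentially small:

\[ P\left( \|\tilde{W}\| > 2K\sqrt{n} \right) \le P\left( \|W\| > 2K\sqrt{n} \right) \le e^{-cn}. \]

Hence,

\[ P\left( \exists h \in \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) : \tilde{W}h = 0 \right) \le e^{-c'n}. \]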

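Note that the vector $h$ is independent of $W_1$. Therefore,

\[ P\left( \mathrm{dist}(W_1, H_1) < t(2K)^{-C} \right) \]
\[ \le P\left( |h^T W_1| < t(2K)^{-C},\ \tilde{W}h = 0, \text{ and } h \in \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) \right) \]
\[ \quad + P\left( |h^T W_1| < t(2K)^{-C},\ \tilde{W}h = 0, \text{ and } h \notin \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) \right) \]
\[ \le e^{-c'n} + E\, P\left( |h^T W_1| < t(2K)^{-C} \mid h \notin \mathrm{Comp}(1 - \kappa/2, (2K)^{-C}) \right) \]
\[ \le e^{-c'n} + \sup_{u \in \mathrm{Incomp}(1 - \kappa/2, (2K)^{-C})} P\left( |u^T W_1| < t(2K)^{-C} \right). \]

Assume that $u \in \mathrm{Incomp}(1 - \kappa/2, (2K)^{-C})$. Let
$J = \{ j \in [n] : |u_j| \ge (2K)^{-C} n^{-1/2} \}$. Then
$|J| \ge (1 - \kappa/2)n$. Hence, if
$J' = \{ j \in [n] : |a_{j,1}| \ge r \}$, then
$|J \cap J'| \ge (\delta - \kappa/2)n \ge (\delta/2)n$. Therefore,
$u^T W_1$ is a centered normal random variable with variance
$\ge r^2 (2K)^{-2C} \cdot \delta/2$, and so

\[ P\left( |u^T W_1| < t(2K)^{-C} \right) \le C't. \]

This means that

\[ P\left( \mathrm{dist}(W_1, H_1) < t(2K)^{-C} \right) \le t + e^{-c'n}, \]

and the same estimate holds for $\mathrm{dist}(W_j, H_j)$, $j > 1$, so
the theorem follows from Lemma 5.1.
□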

6. Intermediate singular value 

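The next elementary lemma allows one to find a set of rows of a fixed
matrix with big $\ell_2$ norms, provided that the graph of the matrix
has a large minimal degree.

Lemma 6.1. Let $k < n$, and let $A$ be an $n \times n$ matrix. Assume
that

(1) $a_{ij} \in \{0\} \cup [r, 1]$ for some constant $r > 0$ and all
$i, j$;

(2) the graph $\Gamma_A$ satisfies $\deg(j) \ge \delta n$ for all
$j \in [n]$.

Then for any $J \subset [n]$ there exists a set $I \subset [n]$ of
cardinality

\[ |I| \ge (r^2\delta/2)n, \]

such that for any $i \in I$

\[ \sum_{j \in J} a_{i,j}^2 \ge (r^2\delta/2) \cdot |J|. \]

Proof. By the assumption on $A$,

\[ \sum_{i=1}^{n} \sum_{j \in J} a_{i,j}^2 \ge r^2 \delta n \cdot |J|. \]

Let $I = \{ i \in [n] \mid \sum_{j \in J} a_{i,j}^2 \ge r^2\delta|J|/2 \}$.
Then

\[ r^2\delta n |J| \le \sum_{i \in I} \sum_{j \in J} a_{i,j}^2 + \sum_{i \notin I} \sum_{j \in J} a_{i,j}^2 \le |I| \cdot |J| + n \cdot \frac{r^2\delta|J|}{2}, \]

so $|I| \ge (r^2\delta/2)n$.

□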



We also need the following lemma concerning the Gaussian measure in
$\mathbb{R}^n$.

Lemma 6.2. Let $E, F$ be linear subspaces of $\mathbb{R}^n$. Let
$P_E, P_F$ be the orthogonal projections onto $E$ and $F$, and assume
that for some $\tau > 0$,

\[ \forall y \in F, \quad \|P_E y\|_2 \ge \tau \|y\|_2. \]

Let $g_E$ be the standard Gaussian vector in $E$. Then for any $t > 0$

\[ P\left( \|P_F g_E\|_2 \le t \right) \le \left( \frac{ct}{\tau\sqrt{\dim F}} \right)^{\dim F}. \]

Proof. Let $E_1 = P_E F$. Then (because $\tau > 0$), the linear operator
$P_E : F \to E_1$ has a trivial kernel and hence is a bijection. Denote
by $g_H$ the standard Gaussian vector in the space
$H \subset \mathbb{R}^n$. Let $U : \mathbb{R}^n \to \mathbb{R}^n$ be an
isometry such that $UE_1 = F$ and $UF = E_1$. Then
$P_F = U P_{E_1} U$ and $U g_{E_1}$ has the same distribution as $g_F$.
Therefore, integrating over the coordinates of $g_E$ orthogonal to
$E_1$, we get

\[ P\left( \|P_F g_E\|_2 \le t \right) \le P\left( \|U P_{E_1} U g_{E_1}\|_2 \le t \right) = P\left( \|P_{E_1} g_F\|_2 \le t \right) \le P\left( \|g_F\|_2 \le t/\tau \right). \]

The lemma follows from the standard density estimate for the Gaussian
vector.

□

Let $J \subset [m]$. For levels $Q > q > 0$ define the set of totally
spread vectors

\[ S^J_{q,Q} := \left\{ y \in S^J : \frac{q}{\sqrt{|J|}} \le |y_k| \le \frac{Q}{\sqrt{|J|}} \text{ for all } k \in J \right\}. \tag{6.1} \]

Lemma 6.3. Let $\delta, \rho \in (0, 1)$. There exist $Q > q > 0$ and
$\alpha, \beta > 0$, which depend polynomially on $\delta, \rho$, such
that the following holds. Let $d \le m \le n$ and let $W$ be an
$n \times m$ random matrix with independent columns. For
$I \subset [m]$ denote by $H_I$ the linear subspace spanned by the
columns $W_i$, $i \in I$. Let $J$ be a uniformly chosen random subset of
$[m]$ of cardinality $d$. Then for every $\epsilon > 0$

\[ P\left( \inf_{x \in \mathrm{Incomp}(\delta,\rho)} \|Wx\|_2 < \alpha\epsilon\sqrt{\frac{d}{n}} \right) \le \beta^{d} \cdot E_J\, P\left( \inf_{z \in S^J_{q,Q}} \mathrm{dist}(Wz, H_{J^c}) < \epsilon \right). \tag{6.2} \]

Remark 6.4. Lemma 6.3 was proved in [16] for random matrices with
i.i.d. entries (see Lemma 6.2 there). However, that proof can be
extended to the general case without any changes.

Proof of Theorem 2.4. Set $B = EW$ and let $A = (a_{ij})$, where
$a_{ij}^2 = \mathrm{Var}(w_{ij})$, so

\[ W = A \odot G + B, \]

where $G$ is the $n \times m$ standard Gaussian matrix. Without loss of
generality assume that

(6.3) K<—. 

If this inequality doesn't hold, we can redefine $\kappa$ as the right
hand side of this inequality, and note that the strong connectedness
property is retained when $\kappa$ gets smaller.

Let $C > 0$ be as in Lemma 4.4. Decomposing the sphere into
compressible and incompressible vectors, we write

\[ P\left( s_m(W) \le c t K^{-C} \cdot \frac{n-m}{\sqrt{n}} \right) \le P\left( \inf_{x \in \mathrm{Comp}(c_0, c_1 K^{-2})} \|Wx\|_2 \le c t K^{-C} \cdot \frac{n-m}{\sqrt{n}} \right) + P\left( \inf_{x \in \mathrm{Incomp}(c_0, c_1 K^{-2})} \|Wx\|_2 \le c t K^{-C} \cdot \frac{n-m}{\sqrt{n}} \right). \tag{6.4} \]



By Lemma 4.2 the first term in the right side of (6.4) does not exceed

\[ e^{-c_2 n} + P\left( \|W\| > 2K\sqrt{n} \right). \]

By Lemma 3.1 the last term in the last expression is smaller than
$e^{-cn}$ if $K$ is large enough.

To estimate the second term in the right side of (6.4) we use Lemma
6.3. Recall that by that lemma, we can assume that $q = K^{-C}$ and
$Q = K^{C}$ for some constant $C$. Then the lemma reduces the problem
to estimating

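$\mathbb{P}\left( \inf_{z \in S^J_{q,Q}} \mathrm{dist}(Wz, H_{J^c}) < \varepsilon \right)$

for these $q, Q$ and for a fixed subset $J \subset [m]$ of cardinality

$d = \lfloor (n-m)/2 \rfloor$

and with a properly chosen $\varepsilon$, see (6.8) below.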

Since we do not control the norm of the submatrix of $B$ corresponding
to $J$, we will reduce the dimension further to eliminate this
matrix. Set $H_0 = B\mathbb{R}^J \subset \mathbb{R}^n$, and let $F = (H_{J^c} \cup H_0)^\perp$. Then $F$ is a
linear subspace of $\mathbb{R}^n$ independent of $\{W_j, j \in J\}$, and

(6.5) $n - m \le \dim F \le n - m + d \le 2(n - m).$

Since $P_F B\mathbb{R}^J = \{0\}$, we get

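(6.6)

$\mathbb{P}(\exists z \in S^J_{q,Q} : \mathrm{dist}(Wz, H_{J^c}) < \varepsilon) \le \mathbb{P}(\exists z \in S^J_{q,Q} : \|P_F W z\|_2 < \varepsilon)$
$= \mathbb{P}(\exists z \in S^J_{q,Q} : \|P_F (A \odot G) z\|_2 < \varepsilon)$
for any $\varepsilon > 0$.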

We start with bounding the small ball probability for a fixed vector
$z \in S^J_{q,Q}$. The $i$-th coordinate of the vector $(A \odot G)z$ is a normal
random variable with variance

$\sigma_i^2 = \sum_{j \in J} a_{i,j}^2 z_j^2.$

Let $I \subset [n]$ be the set constructed in Lemma 6.1. Then for any $i \in I$
we have $\sigma_i \ge cq = c'K^{-C}$. Let $E$ be the subspace of $\mathbb{R}^n$ spanned by
the vectors $e_i$, $i \in I$.

Since $P_E(A \odot G)z$ and $P_{E^\perp}(A \odot G)z$ are independent Gaussian vectors,

(6.7)

$\mathbb{P}(\|P_F(A \odot G)z\|_2 < \varepsilon)$

$= \mathbb{E}_{P_{E^\perp}(A \odot G)} \mathbb{P}\left( \|P_F P_E (A \odot G)z + P_F P_{E^\perp}(A \odot G)z\|_2 < \varepsilon \mid P_{E^\perp}(A \odot G) \right)$

$\le \mathbb{P}(\|P_F P_E (A \odot G)z\|_2 < \varepsilon) \le \mathbb{P}(\|P_F g_E\|_2 < cK^{C}\varepsilon).$

Here $g_E$ is the standard Gaussian vector in $E$. The first inequality in
(6.7) is a consequence of Anderson's inequality [1, Theorem 1], applied
to the indicator function $f(x) = 1_{\|x\|_2 < \varepsilon}$ of a convex symmetric set and the Gaussian
random vector $P_F P_E (A \odot G)z$. The last inequality in (6.7) follows
since $P_E(A \odot G)z$ is a vector with independent normal coordinates
with variances greater than $c'K^{-C}$.

Now we have to check that the spaces $E$ and $F$ satisfy the conditions
of Lemma 6.2 with high probability. Let $\tilde{A}, \tilde{B}, \tilde{G}$, and $\tilde{W}$ be $(m - d) \times n$
matrices whose rows coincide with the columns of the matrices
$A, B, G$, and $W$ corresponding to the set $J^c$. Then the condition $F \perp
\mathrm{span}(W_j, j \in J^c)$ can be rewritten as $F \subset \mathrm{Ker}(\tilde{W})$. By Lemma 4.4
and (6.3),

P (P n 5""^ ^ lncomp(l - r^6/A, K'^)) 

{^xe Comp(l - k/2, K~^) ■.Wx = 0J < e-"^"" . 

Assume that P fl 5""^ C lncomp(l — / A, K~^). Since dimP = 
|/| > (r^5/4)n, the incompressibility means that for any y & F Cl 
S*""^, ||PEt/||2 > T = K~'" . Hence, by (16. 7p and Lemma 16.21 for 

P ( ||Pp(A G)z||2 < e and P n S"^-^ C Incomp(l - r^S/A, R-^)) 

dK^ e \ ^ I c'K^ e \ 
T\/n — m ) ~ \\/n — m J 

By Lemma 3.1 and (6.5),

P {\\Pf{A G) : M-^ ^ > Got-^^^V^^T^) < e-*"'("-'"). 
Let

V 



2Co\/n — m 

By the volumetric estimate we can find an $\eta$-net $\mathcal{N} \subset S^J_{q,Q}$ of cardinality



W'\ < (^^ ) • 



For $\eta$ chosen above we have



P (3^ e Af'' : \\Pf{A G)z\\2 < e and Fn^""^ C lncomp(l-r25/4, K' 

j]J \y/n -mj ~ \^/t^/n - m, 
This does not exceed $t^{(n-m)/4}$ if we set

(6.8) $\varepsilon = cK^{-C}\sqrt{n-m} \cdot t.$

Assume now that 

• VzeX^ \\Pf{AqG)z\\^ > e; 

• Fn -5"-^ C Incomp(l - R-^); 

• \\Pf{A QG):R^ ^W\\< Cot-^/^V^fT^. 

The previous proof shows that these conditions are satisfied with
probability at least

$1 - t^{(n-m)/4} - e^{-c_3 n} - e^{-c'(n-m)} \ge 1 - 2t^{(n-m)/4} - e^{-c_3 n}.$



Let $z' \in S^J_{q,Q}$ and let $z \in \mathcal{N}$ be an $\eta$-approximation of $z'$: $\|z - z'\|_2 < \eta$.
Then, on the event above,

\\Pf{A G)z'\\2 > \\Pf{A G)z\\^ - \\Pf{A G) : ^ ■ r/ 

> £ - Cor^^Wn - m ■ 7] > e/2. 
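The passage from the net to the whole set rests on the triangle inequality $\|Mz'\|_2 \ge \|Mz\|_2 - \|M\| \cdot \|z - z'\|_2$. The following is a minimal numerical sketch of that single step, not the authors' argument: the function names are illustrative, and the Frobenius norm is used as a convenient upper bound for the operator norm $\|M\|$.

```python
import math
import random


def mat_vec(m, z):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, z)) for row in m]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def frobenius(m):
    """Frobenius norm; it dominates the operator norm."""
    return math.sqrt(sum(x * x for row in m for x in row))


def approximation_gap(m, z, z_prime):
    """Return (lhs, rhs) of ||M z'|| >= ||M z|| - ||M||_F * ||z - z'||."""
    diff = [a - b for a, b in zip(z, z_prime)]
    lhs = norm(mat_vec(m, z_prime))
    rhs = norm(mat_vec(m, z)) - frobenius(m) * norm(diff)
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    m = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    z = [0.5, 0.5, 0.5, 0.5]
    z_prime = [0.5, 0.5, 0.51, 0.49]
    print(approximation_gap(m, z, z_prime))
```

If $\|Mz\|_2 \ge \varepsilon$ on the net and the mesh $\eta$ is at most $\varepsilon$ divided by twice the norm bound, the right-hand side is at least $\varepsilon/2$, which is the conclusion drawn in the text.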

We thus have proved that

$\mathbb{P}\left(\exists z \in S^J_{q,Q} : \|P_F(A \odot G)z\|_2 < cK^{-C}\sqrt{n-m} \cdot t\right) \le 2t^{(n-m)/4} + e^{-c_3 n}.$

Combining this with (6.2), (6.6), and (6.8) we obtain

f( inf \\Wx\\^<a-ctK^^ -"^^^^^ 

< P'^ ■ max P (3^ e S^Q : dist{W z,Hjc) < a ■ ctR-^ ■ - m) 

JC[n], \J\=d ' 

<f3'^- max P (3^ G S^q \\Pf{A G)z\\^ < a ■ ctR-^ ■ y/n - m) 

JC[n], \J\=d 

Recall that $d = \lfloor (n-m)/2 \rfloor$, $\alpha = K^{-C}$, and $\beta = K^{C}$. Replacing $t$ by
$\beta^{-2}t$ in the inequality above to eliminate the coefficient $\beta^d$ in the right
hand side, we complete the proof of the theorem.

□ 

7. Applications 

7.1. Singular value bounds. The bound on the smallest singular
value in Theorem 2.5 follows immediately from Theorem 2.3 and Lemma
3.1. To bound $s_m(A \odot G)$ we apply Theorem 2.4 to the matrix $W$
consisting of the first $m$ columns of $A \odot G$ and note that $s_m(W) \le
s_m(A \odot G)$.

To prove Theorem 2.7, decompose the matrix $W$ by writing $W =
W^{(1)} + W^{(2)}$, where $W^{(1)}$ and $W^{(2)}$ are independent centered Gaussian
matrices with independent entries and

Let VL be the event || W^*-^'' || < C' log^ n. By Lemma [3^ for appropriate 
constants C", C", one has 

P(0'=) < P(||W^|| > C'log^n) < exp(-C"log^n). 

On the other hand, 

P {sn{W) < ctn'^ log"^' n and Q) 

< E^^2)F + < ctn~^ log'^' n \ n) 

< sup P (s„(iy(^) +X) < ctn-^ log""' n). 

X:i|X||<C" log^n 

By Theorem 12.31 applied to ^/nW^^^ and B = y/nX with R = C log^ n, 
the last probability is at most $t + e^{-cn}$. The second estimate in Theorem
2.7 is proved by the same argument.
7.2. Permanent estimates. We turn next to the proof of the
theorems in the introduction. We begin with a refinement of Theorem 1.1.

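Theorem 7.1. There exist $C, c, c'$ depending only on $\delta, \kappa$, such that for
any $\tau \ge 1$ and any adjacency matrix $A$ of a $(\delta, \kappa)$-strongly connected
graph

(7.1) $\mathbb{P}\left( |\log\det^2(A_{1/2} \odot G) - \mathbb{E}\log\det^2(A_{1/2} \odot G)| > C(\tau n \log n)^{1/3} \right)$
$\le 6\exp(-\tau) + 3\exp\left(-c\tau^{1/3} n^{1/3} \log^{-2/3} n\right) + 9\exp(-c'n),$

and

(7.2) $\mathbb{E}\log\det^2(A_{1/2} \odot G) \le \log \mathrm{per}(A) \le \mathbb{E}\log\det^2(A_{1/2} \odot G) + C'\sqrt{n \log n}.$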

Theorem 1.1 follows from Theorem 7.1 since the right side of (7.1)
does not exceed $9\exp(-\tau) + 12\exp(-c\sqrt{n}/\log n)$. The coefficients 9
and 12 can be removed by adjusting the constants $C$ and $c'$.

Proof. The proof of Theorem 7.1 is partially based on the ideas of [5,
Pages 1563-1566]. We would like to apply the Gaussian concentration
inequality to the logarithm of the determinant of the matrix $A_{1/2} \odot G$,
which can be written as the sum of the logarithms of its singular values.
However, since the logarithm is not a Lipschitz function, we will have
to truncate it in a neighborhood of zero in order to be able to apply
the concentration inequality. This truncation is introduced in Section
7.2.1.

The singular values will be divided into two groups. For the large
values of $n - l$ we use the concentration of (sums of subsets of) the singular
values $s_{n-l}(A_{1/2} \odot G)$ around their mean. In contrast to [5], we do not
use the concentration inequality once, but rather divide the range of
singular values into several subsets, and apply the concentration
inequality separately in each subset. The definition of the subsets, introduced
in Section 7.2.1, will be chosen to match the singular value estimates
of Theorem 2.4.

On the other hand, when $n - l$ becomes small, the concentration
doesn't provide an efficient estimate. In that case we use the lower
bounds for such singular values obtained in Theorem 2.3. Because
the number of singular values treated this way is small, their total
contribution to the sum of the logarithms will be small as well. This
computation is described in Section 7.2.2.

Getting rid of the truncation of the logarithm requires an a-priori
rough estimate on the second moment of $\log\det^2(A_{1/2} \odot G)$, which is
presented in Lemma 7.2 and proved in Section 7.3. With this, we arrive
in Section 7.2.3 at the control of the deviations of $\log\det^2(A_{1/2} \odot G)$
from $\mathbb{E}\log\det^2(A_{1/2} \odot G)$ that is presented in (7.1).

To complete the proof of the Theorem, we will need to relate
$\mathbb{E}\log\det^2(A_{1/2} \odot G)$ to $\log\mathbb{E}\det^2(A_{1/2} \odot G) = \log\mathrm{per}(A)$. This is achieved
in Section 7.2.4 by again truncating the log (at a level different than
that used before) and employing an exponential inequality.
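The identity $\mathbb{E}\det^2(A_{1/2} \odot G) = \mathrm{per}(A)$ that underlies the estimator can be checked numerically on tiny matrices. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the permanent is computed by brute force (feasible only for very small $n$), and the expectation is replaced by a plain Monte Carlo average.

```python
import itertools
import math
import random


def permanent(a):
    """Permanent by direct summation over all permutations (small n only)."""
    n = len(a)
    return sum(
        math.prod(a[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )


def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def bgg_sample(a, rng):
    """One sample of det(A_{1/2} * G)^2, entrywise product with Gaussian G."""
    n = len(a)
    w = [[math.sqrt(a[i][j]) * rng.gauss(0.0, 1.0) for j in range(n)]
         for i in range(n)]
    return determinant(w) ** 2


def bgg_mean(a, samples, seed=0):
    """Monte Carlo average of the squared determinant."""
    rng = random.Random(seed)
    return sum(bgg_sample(a, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    a = [[1, 1], [1, 1]]
    print(permanent(a), bgg_mean(a, 20000))
```

The sample average fluctuates strongly because $\det^2$ has heavy tails, which is one reason the text works with $\log\det^2$ rather than with $\det^2$ itself.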

7.2.1. Construction of the truncated determinant. Let $k_* \in \mathbb{N}$ be a
number to be specified later. We choose truncation dimensions and
the truncation levels for large codimensions first. For $k = 0, \dots, k_*$,
set

nk = n- 2-4^ 
4 = v^-2'=+'=*; 

I — r>— 4fr 

'n 



Here, $c_0$ is a fixed constant to be chosen below. We also set $l_* = n_{k_*}$.
For any n x n matrix V define the function f{V) by 

fiy) = Y,Uy)^ where /,(l-)= ^ log,^(s„_,(l^)), 

fe=l l=n—nf^_i 

where $\log_\varepsilon(x) = \log(x \vee \varepsilon)$. Recall that the function $S : \mathbb{R}^{n^2} \to \mathbb{R}^n$
defined by $S(V) = (s_1(V), \dots, s_n(V))$ is 1-Lipschitz. Hence, each
function $f_k$ is Lipschitz with Lipschitz constant

L, < ^^^-^ ~ < c' ■ 
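The Lipschitz bound comes from the fact that $\log_\varepsilon$ has slope at most $1/\varepsilon$, so a sum of $N$ truncated logarithms is $\sqrt{N}/\varepsilon$-Lipschitz in the Euclidean norm of the vector of singular values. A small sketch of this elementary fact follows; the function names are illustrative and the inputs stand in for singular values.

```python
import math


def log_eps(x, eps):
    """Truncated logarithm log(max(x, eps))."""
    return math.log(max(x, eps))


def truncated_log_sum(singular_values, eps):
    """Sum of truncated logarithms over a block of singular values."""
    return sum(log_eps(s, eps) for s in singular_values)


def lipschitz_ratio(s, t, eps):
    """|F(s) - F(t)| / ||s - t||_2 for F = truncated_log_sum (s != t)."""
    diff = abs(truncated_log_sum(s, eps) - truncated_log_sum(t, eps))
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))
    return diff / dist


if __name__ == "__main__":
    s = [0.01, 0.5, 2.0]
    t = [0.3, 0.4, 1.0]
    eps = 0.1
    # The ratio never exceeds sqrt(N) / eps.
    print(lipschitz_ratio(s, t, eps), math.sqrt(len(s)) / eps)
```

Without the truncation the ratio is unbounded as a singular value approaches zero, which is why the untruncated logarithm cannot be fed directly into the Gaussian concentration inequality.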

Denote $W = A_{1/2} \odot G$. The concentration of the Gaussian measure
implies that for an appropriately chosen constant $C$, one has

P ilMW) - EMW)\ > Ctk) < 2exp (^-||^ < 2exp {-2'(>^'-^h) . 

(For this version, see e.g. [13, Formula (2.10)].) Therefore,

(7.3) $\mathbb{P}\left( |f(W) - \mathbb{E}f(W)| \ge C\sum_{k=1}^{k_*} t_k \right) \le \sum_{k=1}^{k_*} 2\exp\left(-2^{2(k_*-k)}\tau\right) \le 4e^{-\tau}.$
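The last step of (7.3) is a union bound over the blocks $k = 1, \dots, k_*$. As a sanity check, and assuming that the $k$-th tail term has the form $2\exp(-2^{2(k_*-k)}\tau)$ with $\tau \ge 1$ (an assumption read off from the display, with an illustrative function name), the sum can be evaluated numerically and compared with $4e^{-\tau}$.

```python
import math


def union_bound_sum(tau, k_star):
    """Sum over k = 1..k_star of 2 * exp(-4 ** (k_star - k) * tau)."""
    return sum(
        2.0 * math.exp(-(4.0 ** (k_star - k)) * tau)
        for k in range(1, k_star + 1)
    )


if __name__ == "__main__":
    for tau in (1.0, 2.5, 10.0):
        print(tau, union_bound_sum(tau, 12), 4.0 * math.exp(-tau))
```

The term with $k = k_*$ dominates; the remaining terms decay doubly exponentially, so the constant 4 is far from tight.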
We similarly handle singular values for / > n — Define the 
function g{V) = J2^=n-rn, ^'^&£k i^n~i(y)), whose Lipschitz constant is 
bounded by y/h/ek* = Cq^^Ju/I^, and therefore 



(7.4) 
Set 

Define 



P {\g{W)-¥.g{W)\>c,^- < 26"^ 
e{l) = 



Sk, I e[nk + l,nk-i] 



n-1 
1=0 

We include $l_*$ as the second argument to emphasize the dependence
on the truncation level. From (7.3) and (7.4), we obtain the large
deviation bound for the logarithm of the truncated determinant:

(7.5) $\mathbb{P}\left( |\log\det(W, l_*) - \mathbb{E}\log\det(W, l_*)| \ge c_2\sqrt{\tau n/l_*} \right) \le 6e^{-\tau}.$

7.2.2. Basic concentration estimate for $\log\det^2(W)$. Our next goal is
to get rid of the truncation, i.e., to relate $\det(W, l_*)$ to $\det^2(W)$. Toward
this end, define the set of $n \times n$ matrices $W_1$ as follows:

$W_1 = \{V \mid \exists k,\ 1 \le k \le k_*,\ s_{n-n_k}(V) \le \varepsilon_k\}.$

Then by Theorem 2.4,

k=i ^ "^^ ^ 

with an appropriate choice of the constant $c_0$.

For codimensions smaller than = Uk, we simply estimate the total 
contribution of small singular values. For < Z < set 

u 1 

Let W2 be the set of n x n matrices defined by 

Applying Theorem 2.3 for $0 \le l < 4$ and Theorem 2.4 for $4 \le l \le l_*$, we obtain

3 u 1/4 

P (ly G W2) < 5^ ■ + ^ ( c Y ■ dn-i ) + l)e-''' 

1 = 4: ^ 

< Ch ■ n~W7. < CU ■ exp(-/*/4) < exp(-/*/8). 
Assume that $V \notin W_2$. Then



1=0 1=0 

Let $W_3$ denote the set of all $n \times n$ matrices $V$ such that $\|V\| > n$.
Then $\mathbb{P}(W \in W_3) \le e^{-n}$. If $V \notin W_3$, then

$\sum_{l=0}^{l_*} \log s_{n-l}(V) \le l_* \log n.$

Therefore, for any V G (W2 U W3Y, 
^ loff n. < 



3 

logra < ^logs„_,(K) < ^\og{sn-i{V) V^fcJ < /=,logn. 



1=0 1=0 
We thus obtain that if W e (Wi U W2 U W3Y then 

(7.6) |logdet2(iy) -logckt(PF,/,)| < ^/*logn 

Note that the event $W \in (W_1 \cup W_2 \cup W_3)^c$ has probability larger than
$1 - 3e^{-l_*/8}$.
Setting

$Q(l_*) = \mathbb{E}\log\det(W, l_*),$
we thus conclude from (7.5) that

(7.7) P (^|logdet'(W) -Q(/,)| > \ogn + C2^/Tn/l?j 

< 6exp(— r) + 3 exp(— /=i,/8) . 

This is our main concentration estimate. We will use it with $l_*$ depending
on $\tau$ to obtain an optimized concentration bound. Also, we will use
special choices of $l_*$ to relate a hard to evaluate quantity $Q(l_*)$ to the
characteristics of the distribution of $\det^2(W)$, namely to $\mathbb{E}\log\det^2(W)$
and $\log\mathbb{E}\det^2(W)$. This will be done by comparing $\mathbb{E}\log\det^2(W)$ to
$Q(l_1)$ and $\log\mathbb{E}\det^2(W)$ to $Q(l_2)$ for different values $l_1$ and $l_2$. This
means that we also have to compare $Q(l_1)$ and $Q(l_2)$. The last
comparison requires only (7.7).

Let 100 < /i, /2 < n/2. For j = 1, 2, denote 

W, = |v^||logdet2(\/) -Q(/,)| < ^Z,logn + 4c2yQi~|. 
Using (7.7) with $\tau = 16$, we show that $\mathbb{P}(W_j) > 1/2$ for $j = 1, 2$. This
means that $W_1 \cap W_2 \ne \emptyset$. Taking $V \in W_1 \cap W_2$, we obtain

(7.8) \Q{k) - QM < \Q{k) - logdet2(\/)| + | logdet2(y) - QM 

<^(l, + l,)\ogn + cn'/\l-'/' + 1,'/'). 

7.2.3. Comparing $Q(l_1)$ to $\mathbb{E}\log\det^2(W)$. Our next task is to relate
$\mathbb{E}\log\det^2(W)$ to $Q(l_*)$ for some $l_* = l_1$. Toward this end we optimize
the left side of (7.7) for $\tau = 8$ by choosing $l_* = l_1$, where

$2n^{1/3}\log^{-2/3} n \le l_1 = n \cdot 2^{-4k_1} \le 32 n^{1/3}\log^{-2/3} n.$

Then we get from (7.7) that there exists $c > 0$ such that for all $\tau \ge 1$,

(7.9) $\mathbb{P}\left( |\log\det^2(W) - Q(l_1)| \ge c\tau^{1/2}(n\log n)^{1/3} \right)$

$\le 6\exp(-\tau) + 3\exp(-l_1/8).$
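The choice of $l_1$ balances the two error terms, one of order $l \log n$ and one of order $\sqrt{\tau n/l}$. A minimal sketch of this balancing, under the assumption that the constants in front of both terms are dropped (function names are illustrative): solving $l\log n = \sqrt{\tau n/l}$ gives $l = (\tau n)^{1/3}\log^{-2/3} n$, and rounding up to the dyadic grid $n \cdot 2^{-4k}$ loses at most a factor $16$.

```python
import math


def balanced_level(n, tau=8.0):
    """Level l solving l * log(n) = sqrt(tau * n / l)."""
    return (tau * n) ** (1.0 / 3.0) * math.log(n) ** (-2.0 / 3.0)


def dyadic_level(n, tau=8.0):
    """Largest k with n * 2**(-4k) >= balanced_level; returns (k, level)."""
    target = balanced_level(n, tau)
    k = 0
    while n * 2.0 ** (-4 * (k + 1)) >= target:
        k += 1
    return k, n * 2.0 ** (-4 * k)


if __name__ == "__main__":
    n = 10 ** 6
    target = balanced_level(n)
    k, level = dyadic_level(n)
    # target <= level < 16 * target by construction
    print(target, k, level)
```

For $\tau = 8$ the balanced value is exactly $2n^{1/3}\log^{-2/3} n$, which matches the lower end of the range for $l_1$ in the text, and the factor 16 accounts for the upper end.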

Let $W_4$ be the set of all $n \times n$ matrices $V$ such that $|\log\det(V)^2 -
Q(l_1)| > \sqrt{n}$. The inequality (7.9) applied with $\tau = c'l_1$ for an
appropriate $c'$ reads

(7.10) $\mathbb{P}(W \in W_4) \le \exp(-cl_1) = \exp\left(-Cn^{1/3}\log^{-2/3} n\right).$
We have

$|\mathbb{E}\log\det^2(W) - Q(l_1)| \le \mathbb{E}|\log\det^2(W) - Q(l_1)|$
$= \mathbb{E}|\log\det^2(W) - Q(l_1)| \cdot 1_{W_4^c}(W) + \mathbb{E}|\log\det^2(W) - Q(l_1)| \cdot 1_{W_4}(W).$
The first term here can be estimated by integrating the tail in (7.9):
$\mathbb{E}|\log\det^2(W) - Q(l_1)| \cdot 1_{W_4^c}(W)$

< c(nlogn)^/^ + /""^ P(|logdet^(Vr) -Q(/i)| > x)da; 

J c{n log n)-'-/'^ 

< c(nlogn)^/=^ + I 2exp I - f— — ^— — ^ ) c/x < ^(n logn)^/^ 

J\ \ \c[n\ognYl^ ) j 

To bound the second term, we need the following rough estimate of the
second moment of the logarithm of the determinant. The proof of this
estimate will be presented in the next subsection.

Lemma 7.2. Let $W = G \odot A'_{1/2}$, where $G$ is the standard Gaussian
matrix, and $A'$ is a deterministic matrix with entries $0 \le a'_{i,j} \le 1$ for
all $i, j$ having at least one generalized diagonal with entries $a'_{i,\pi(i)} \ge c/n$
for all $i$. Then

Elog^det^(W) < Cn\ 
Since $A$ is the matrix of a $(\delta, \kappa)$-strongly connected graph, it satisfies
the conditions of Lemma 7.2. The estimate of the second term follows
from Lemma 7.2, (7.9), and the Cauchy-Schwarz inequality:

E\\ogdet\W)-Qih)\-ln>,iW) 

< (E| \ogdet\W) - g(/i)|')'^' -F'/^W G W4) 

< {On' + 2Q2(/,))1/2 . exp (^-{C/2)n^/'' \og-^/' . 
Combining the bounds for $W_4^c$ and $W_4$, we get

lElogdet'(iy) -Q(/i)| 

< C(nlogn)^/3 ^ ^(5^3 ^ 2Q2(;^))1/2 . (^-[C/2)n^/Hog-^^' nj , 
which implies

(7.11) $|\mathbb{E}\log\det^2(W) - Q(l_1)| \le C'(n\log n)^{1/3}.$

7.2.4. Comparing $\log\mathbb{E}\det^2(W)$ to $\mathbb{E}\log\det^2(W)$. We start with relating
$Q(l_1)$ and $\log\mathbb{E}\det^2(W) = \log\mathrm{per}(A)$. To this end we will use a
different value of $l_*$. Namely, choose $l_2$ so that

$\sqrt{n/\log n} \le l_2 = n \cdot 2^{-4k_2} \le 16\sqrt{n/\log n}.$

The reasons for this choice will become clear soon. Denote for brevity

$U := \log\det(W, l_2) - \mathbb{E}\log\det(W, l_2).$
We deduce from (7.5) that

$\mathbb{E}(e^U) \le \mathbb{E}(e^{|U|}) \le 1 + \int_0^\infty e^t\, \mathbb{P}(|U| > t)\, dt$

$\le 1 + 6\int_0^\infty e^t e^{-t^2 l_2/(c_2^2 n)}\, dt \le 1 + c_3 e^{cn/l_2}.$



Taking logarithms, we conclude that

$\log\mathbb{E}\det^2(W) \le \log\mathbb{E}\det(W, l_2)$

$\le \mathbb{E}\log\det(W, l_2) + \log(1 + c_3 e^{cn/l_2})$

$\le Q(l_2) + c_4 n/l_2.$
The inequality (7.8) implies

(7.12) logEdet2(iy) 

< Qih) + c^n/k + ^(/i + I2) \ogn + cn^'\l-^'^ + i:,^'^) 



$\le Q(l_1) + c_5\sqrt{n\log n}.$

The value of $l_2$ was selected to optimize the inequality (7.12). To bound
$Q(l_1) - \log\mathbb{E}\det^2(W)$ from above, we use (7.9) with $\tau = 4$ to derive

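(7.13) $\mathbb{P}\left( |\log\det^2(W) - Q(l_1)| \le 2c(n\log n)^{1/3} \right) \ge 1 - 6e^{-4} - 3e^{-l_1/8} > \frac{1}{2}.$

On the other hand, Chebyshev's inequality applied to the random
variable $\det^2(W)/\mathbb{E}\det^2(W)$ implies that

$\mathbb{P}\left(\det^2(W) \le 2\mathbb{E}\det^2(W)\right) \ge \frac{1}{2},$

and therefore

(7.14) $\mathbb{P}\left(\log\det^2(W) - \log\mathbb{E}\det^2(W) \le \log 2\right) \ge \frac{1}{2}.$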

This means that the events in (7.13) and (7.14) intersect, and so

$Q(l_1) - \log\mathbb{E}\det^2(W) \le 2c(n\log n)^{1/3} + \log 2.$
Together with (7.12) this provides a two-sided bound

$|Q(l_1) - \log\mathbb{E}\det^2(W)| \le \max\left(c_5\sqrt{n\log n},\ 2c(n\log n)^{1/3} + \log 2\right)$

$= c_5\sqrt{n\log n}$

for a sufficiently large $n$. The combination of this inequality with (7.11)
yields

$|\mathbb{E}\log\det^2(W) - \log\mathbb{E}\det^2(W)| \le c_6\sqrt{n\log n}.$

7.2.5. Concentration around $\mathbb{E}\log\det^2(W)$. To finish the proof we
have to derive the concentration inequality. This will be done by
choosing the truncation parameter $l_*$ depending on $\tau$. Namely, assume first
that $1 \le \tau \le n^2\log^2 n$ and define $l_*$ by

2-^1/3^1/3 = 2-4fc. < 2- Vi/3ni/3 log-2/3 n. 

The constraint on $\tau$ is needed to guarantee that $k_* \ge 1$. Substituting
this $l_*$ in (7.7), we get

P (^llogdet^(W^) -Q(/*)| > logn + cav^™//,^ 

< 6exp(-r) + 3exp (-CT^^^n^^^ log-"^^^ n) . 
By (7.11) and (7.8), for such $\tau$ we have
$|\mathbb{E}\log\det^2(W) - Q(l_*)|$

$\le |\mathbb{E}\log\det^2(W) - Q(l_1)| + |Q(l_1) - Q(l_*)|$

$\le C''(\tau n\log n)^{1/3}.$
Together with the previous inequality, this implies

$\mathbb{P}\left( |\log\det^2(W) - \mathbb{E}\log\det^2(W)| \ge C(\tau n\log n)^{1/3} \right)$

$\le 6\exp(-\tau) + 3\exp\left(-c\tau^{1/3}n^{1/3}\log^{-2/3} n\right),$

if the constant $C$ is chosen large enough.

If $\tau > \tau_0 := n^2\log^2 n$, we use the inequality above with $\tau = \tau_0$ and
obtain

$\mathbb{P}\left( |\log\det^2(W) - \mathbb{E}\log\det^2(W)| \ge C(\tau n\log n)^{1/3} \right) \le 9\exp(-c'n).$
Finally, for all $\tau \ge 1$, this implies

$\mathbb{P}\left( |\log\det^2(W) - \mathbb{E}\log\det^2(W)| \ge C(\tau n\log n)^{1/3} \right)$

$\le 6\exp(-\tau) + 3\exp\left(-c\tau^{1/3}n^{1/3}\log^{-2/3} n\right) + 9\exp(-c'n),$
which completes the proof of Theorem 7.1. □

7.3. Second moment of the logarithm of the determinant. It
remains to prove Lemma 7.2. The estimate of the lemma, which was
necessary in the proof of (1.2), is very far from being precise, so we
will use rough, but elementary bounds.

Proof of Lemma 7.2. We will estimate the expectations of the squares
of the positive and negative parts of the logarithm separately.
Denote by $W_1, \dots, W_n$ the columns of the matrix $W$. By the Hadamard
inequality,

n n n 

Elog+det(Vr)2 < J]Elog+ \\Wj\\l < n ^ ^ Elog^(l+wJj.) < Cn^ . 

j=l j=l i=l 

Here in the second inequality we used an elementary bound

$\log_+\left(\sum_{i=1}^n u_i\right) \le \sum_{i=1}^n \log(1 + u_i),$

valid for all $u_1, \dots, u_n \ge 0$, and the Cauchy-Schwarz inequality. The
last inequality holds since $w_{i,j}$ is a normal random variable of variance
at most 1.
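Both ingredients of this estimate are elementary and easy to test numerically. The sketch below, with illustrative function names, checks the Hadamard inequality $|\det W| \le \prod_j \|W_j\|_2$ on a random matrix and the bound $\log_+(\sum_i u_i) \le \sum_i \log(1 + u_i)$ on a fixed vector; the determinant is computed by expansion along the first row, which is practical only for small matrices.

```python
import math
import random


def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def column_norm_product(m):
    """Product of the Euclidean norms of the columns."""
    n = len(m)
    return math.prod(
        math.sqrt(sum(m[i][j] ** 2 for i in range(n))) for j in range(n)
    )


def log_plus(x):
    """Positive part of the logarithm; defined as 0 for x <= 1."""
    return max(math.log(x), 0.0) if x > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    w = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
    print(abs(det(w)), column_norm_product(w))
    u = [0.1, 2.0, 0.0, 5.5]
    print(log_plus(sum(u)), sum(math.log1p(x) for x in u))
```

The elementary bound follows from $1 + \sum_i u_i \le \prod_i (1 + u_i)$ for non-negative $u_i$.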

To prove the bound for Elog?. det(l^)^, assume that j > c/n for 

all i G [n]. Set A" = ./^A'^^, so a'/^ > 1, and let W" = A" G. 

Then Elog^ det{W)^ < Elog^ det{W")^ + 2n\ogn. We will prove the 
following estimate by induction: 

(7.15) $\mathbb{E}\log_-^2 \det(W'')^2 \le c'n^2,$

where the constant c' is chosen from the analysis of the one-dimensional 
case. 

For n = 1 this follows from the inequality 

(7.16) Elog^(u;i,i + x) < c', 

which holds for all $x \in \mathbb{R}$. Assume that (7.15) holds for $n$. Denote
by $\mathbb{E}_1$ the expectation with respect to $g_{1,1}$ and by $\mathbb{E}'$ the expectation
with respect to $G^{(1)}$, which will denote the other entries of $G$. Denote
by $D_{1,1}$ the minor of $W''$ corresponding to the entry $(1, 1)$. Note that
$D_{1,1} \ne 0$ a.s. Decomposing the determinant with respect to the first
row, we obtain

Elog! det(W^")' = E' (El [log^_(a;;i (71,1^1,1 + Y) \ G^'^]) 



E' El 



(^iog_(a'i;i(7i,i + iog-(^i,i)) I G(^) 

Since Y/Di^i is independent of gi^i, inequality fl7.16p yields 

El (log^- ((a'i',i^?i,i + I < 

Therefore, by Cauchy-Schwarz inequality, 
Elog^ det{Wy 



c. 



< E' l^c' + 2Ei log_ (^a'; ,g,^^ + | G^^) 



log_(Z}i,i) + log2_(D 



1,1, 



< (^v^+i/E'log2_(A,i)) • 



By the induction hypothesis, $\mathbb{E}'\log_-^2(D_{1,1}) \le c'n^2$, so

$\mathbb{E}\log_-^2 \det(W'')^2 \le c'(n+1)^2.$
This proves the induction step, and thus completes the proof of Lemma 7.2. □

Theorem 1.4 is proved similarly, using this time Theorem 2.7
instead of Theorem 2.5 and taking into account the degradation of the
Lipschitz constant due to the presence of 6„. We omit further details. 

7.4. Concentration far away from permanent. Consider an
approximately doubly stochastic matrix $B$ with all entries of order $\Omega(n^{-1})$,
which has some entries of order $\Omega(1)$. For such matrices the conditions
of Theorem 1.4 are satisfied with $\delta, \kappa = 1$, so the Barvinok-Godsil-Gutman
estimator is strongly concentrated around $\mathbb{E}\log\det^2(B_{1/2} \odot G)$.
Yet, the second inequality of this Theorem reads

log per (S) < Elogdet2(5i/2 Q G) + C'nlog"' cn, 

which is too weak to obtain a subexponential deviation of the esti- 
mator from the permanent. However, the next lemma shows that the 
inequality above is sharp up to a logarithmic term. This means, in par- 
ticular, that the Barvinok-Godsil-Gutman estimator for such matrices 
can be concentrated around a value, which is exp(cn) away from the 
permanent. 

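Lemma 7.3. Let $\alpha > 0$, and let $B$ be an $n \times n$ matrix with entries

$b_{i,j} = \alpha/n$ for $i \ne j$, and $b_{i,j} = 1$ for $i = j$.

There exist constants $\alpha_0, \beta > 0$ so that if $0 < \alpha < \alpha_0$ then

(7.17) $\liminf_{n \to \infty} \frac{1}{n}\left| \mathbb{E}\log\det(B_{1/2} \odot G)^2 - \log\mathbb{E}\det(B_{1/2} \odot G)^2 \right| \ge \beta.$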

Proof. Recall that from (1.4), we have that for any fixed $\alpha < 1$, the
random variable

$\frac{1}{n}\left| \mathbb{E}\log\det(B_{1/2} \odot G)^2 - \log\det(B_{1/2} \odot G)^2 \right|$

converges to 0 (in probability and a.s.). Since

$\mathbb{E}\det(B_{1/2} \odot G)^2 = \mathrm{per}(B) \ge 1,$

it thus suffices to show that, with constants as in the statement of the
lemma,

(7.18) $\liminf_{n \to \infty} \frac{1}{n}\log\det(B_{1/2} \odot G)^2 \le -\beta,$ a.s.
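The inequality $\mathrm{per}(B) \ge 1$ used above holds because the identity permutation alone contributes 1 and all other terms are non-negative. Grouping permutations by their number of fixed points also gives a closed form for $\mathrm{per}(B)$, which the following sketch compares with a brute-force computation. The function names are illustrative, and the grouping uses the standard count $\binom{n}{\ell} D_{n-\ell}$ of permutations with exactly $\ell$ fixed points, where $D_k$ is the number of derangements.

```python
import itertools
import math


def build_b(n, alpha):
    """Matrix with 1 on the diagonal and alpha / n off the diagonal."""
    return [[1.0 if i == j else alpha / n for j in range(n)] for i in range(n)]


def permanent(a):
    """Permanent by direct summation over all permutations (small n only)."""
    n = len(a)
    return sum(
        math.prod(a[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )


def derangements(k):
    """Number of permutations of k elements with no fixed point."""
    d = [1, 0]
    for m in range(2, k + 1):
        d.append((m - 1) * (d[m - 1] + d[m - 2]))
    return d[k]


def permanent_by_fixed_points(n, alpha):
    """Sum over l of C(n, l) * D_{n-l} * (alpha / n) ** (n - l)."""
    return sum(
        math.comb(n, l) * derangements(n - l) * (alpha / n) ** (n - l)
        for l in range(n + 1)
    )


if __name__ == "__main__":
    n, alpha = 5, 0.5
    print(permanent(build_b(n, alpha)), permanent_by_fixed_points(n, alpha))
```

The same grouping by fixed points is what the proof applies to the determinant, where the signs and the Gaussian factors prevent such a closed form.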

We rewrite the determinant as a sum over permutations with $\ell$ fixed
points. We then have

det(i?v20G) = y: \in jy. — -e^^' 

e=0 Fc[n],\F\=e \i&F / e=o 




where $M_F$ is the determinant of an $(n - \ell) \times (n - \ell)$ matrix with
i.i.d. standard Gaussian entries, $\mathbb{E}M_F^2 = (n - \ell)!$, $\sigma(F)$ takes values
in $\{-1, 1\}$ and $M_F$ is independent of $\prod_{i \in F} G_{ii}$. (Note that $M_{F_1}$ is not
independent of $M_{F_2}$ for $F_1 \ne F_2$.)
Recall that 



(7.19) 




where $\ell_n = \ell/n$ and $h$ is the entropy function, $h(x) = -x\log x - (1 -
x)\log(1 - x) \le \log 2$.
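The entropy function controls binomial coefficients through the standard bound $\binom{n}{\ell} \le e^{n h(\ell/n)}$. The display (7.19) is not legible in this copy, so the sketch below only illustrates this standard bound, which is assumed to be the kind of estimate being recalled; the function names are illustrative.

```python
import math


def entropy(x):
    """Entropy h(x) = -x log x - (1 - x) log(1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def log_binomial(n, l):
    """Natural logarithm of the binomial coefficient C(n, l)."""
    return math.log(math.comb(n, l))


if __name__ == "__main__":
    n = 50
    for l in (0, 5, 25, 45, 50):
        # log C(n, l) never exceeds n * h(l / n)
        print(l, log_binomial(n, l), n * entropy(l / n))
```

Since $h$ is continuous with $h(1) = 0$, the number of subsets $F$ with $|F| = \ell$ close to $n$ is only $e^{o(n)}$ times smaller exponential factors, which is what makes the choice of $\delta_1'$ in (7.22) possible.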

We will need the following easy consequence of Chebyshev's inequal- 
ity: for any y > 0, 



e 

(7.20) P(in^-I ^ e-^') ^ (ElGiiDV^ = 

i=l 




It is then clear that there exist $\delta_1, \delta_2 > 0$ so that, for any $\ell_n > (1 - \delta_1)$,
one has

(7.21) (")p(iriG.|>e-^-)<^. 

Choose now $\delta_1' < \delta_1$ positive so that

(7.22) $\delta_2 > 3h(1 - \delta_1'),$

which is always possible since $h(\cdot)$ is continuous and $h(1) = 0$.

We will show that we can find $\alpha_0 > 0$ such that for any $\alpha < \alpha_0$, for
all $n$ large and any $\ell$,

(7.23) P(|A^| > e~^^"/2) < ^. 

This would imply (7.18) and conclude the proof of the lemma.

To see (7.23), we argue separately for $\ell_n \ge (1 - \delta_1')$ and $\ell_n < (1 - \delta_1')$.
In either case, we start with the inequality

(7.24) F{\Ae\ > e-^2"/2) 
Considering first $\ell_n \ge (1 - \delta_1')$, we estimate the right side in (7.24) by
/ \ / ^ 

(7.25) 




g'52n/2 



The first term in (7.25) is controlled by our choice of parameters,
see (7.21). To analyze the second term we use Chebyshev's inequality
and the fact that $\alpha < 1$:




n{l-e„)/2 



Ml 

"(i-^")(n-£)! 



-1 



where the last inequality is due to (7.22). This completes the proof of
(7.23) for $\ell_n \ge (1 - \delta_1')$, for any $\alpha < 1$.

It remains to analyze the case $\ell_n < (1 - \delta_1')$. This is where the
choice of $\alpha_0$ will be made. Starting from (7.24) we have by Chebyshev's
inequality

P(|A,|>e-W.) < (';)V"(|)"'"'"'E|M[,|r 

^ ^n(l-£„)g3nlog2 ^np log 2+5'i log a] _ 

Choosing $\alpha_0 < 1$ such that $3\log 2 + \delta_1'\log\alpha_0 < 0$ shows that the last
term satisfies the bound required in (7.23) for large $n$, and completes the proof of the
lemma. □